
Sentiment analysis of Arabic tweets using supervised machine learning (in English)

Abstract

The increasing volume of user-generated content on social media platforms necessitates effective tools for understanding public sentiment. This study presents an approach to sentiment analysis of Arabic tweets using supervised machine learning techniques. We compared the performance of three popular algorithms: Support Vector Machines (SVM), Naive Bayes (NB), and Logistic Regression (LR). Each was evaluated on two distinct corpora: the Arabic Sentiment Text Corpus (ASTC) and a dataset of Arabic tweets. Our methodology involved four tests assessing the impact of corpus characteristics, preprocessing techniques, the use of N-grams, and weighting methods on classification accuracy. The first test established that the choice of corpus significantly influences model performance, with SVM showing superior accuracy on the structured ASTC, while NB excelled on the informal Arabic tweets. In the second test, preprocessing steps, including the removal of punctuation and stop-words, led to a noticeable improvement in classification accuracy for the Arabic tweets but had minimal or even negative effects on the ASTC. The third test indicated that incorporating N-grams yielded modest improvements for NB and LR on the more structured texts, while their impact on tweets was negligible. Finally, the fourth test compared different weighting techniques, revealing that SVM benefited from the Term Frequency-Inverse Document Frequency (TF-IDF) weighting method, while NB performance remained stable regardless of the weighting approach. These findings underscore the importance of tailoring preprocessing and feature extraction strategies to the specific characteristics of the dataset, ultimately enhancing the accuracy of sentiment analysis in Arabic language contexts.
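The feature-extraction steps named in the abstract (punctuation and stop-word removal, N-grams, TF-IDF weighting) can be illustrated with a minimal sketch. It is not the authors' implementation. The stop-word list, the function names, the two sample tweets, and the exact TF-IDF variant (raw term frequency times log(N / df)) are all assumptions chosen for illustration, and the text is assumed to carry no diacritics.

```python
import math
import re
from collections import Counter

# Hypothetical stop-word list for illustration; the study's list is not given.
STOP_WORDS = {"في", "من", "على", "و"}

def preprocess(text, stop_words=STOP_WORDS):
    """Replace punctuation with spaces, then drop stop-words."""
    # In Python 3, \w matches Unicode letters, including Arabic ones.
    # Assumption: input has no diacritics (they would not match \w).
    cleaned = re.sub(r"[^\w\s]", " ", text)
    return [tok for tok in cleaned.split() if tok not in stop_words]

def ngrams(tokens, n_max=2):
    """Return all word n-grams for n = 1..n_max, joined with spaces."""
    feats = []
    for n in range(1, n_max + 1):
        feats.extend(" ".join(tokens[i:i + n])
                     for i in range(len(tokens) - n + 1))
    return feats

def tf_idf(docs):
    """Weight each feature by raw term frequency times log(N / df)."""
    n_docs = len(docs)
    # Document frequency: number of documents containing each feature.
    df = Counter(feat for doc in docs for feat in set(doc))
    weighted = []
    for doc in docs:
        tf = Counter(doc)
        weighted.append({f: c * math.log(n_docs / df[f])
                         for f, c in tf.items()})
    return weighted

# Two made-up tweets sharing some words but differing in polarity.
tweets = ["الخدمة ممتازة جدا!", "الخدمة سيئة جدا..."]
docs = [ngrams(preprocess(t)) for t in tweets]
weights = tf_idf(docs)
```

In this sketch, features that occur in every document receive a weight of zero, while features unique to one tweet keep a positive weight. The resulting dictionaries could then be vectorized and passed to an SVM, NB, or LR classifier from a library of choice.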

Keywords
